Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation
Generative models for 3D geometric data arise in many important applications
in 3D computer vision and graphics. In this paper, we focus on 3D deformable
shapes that share a common topological structure, such as human faces and
bodies. Morphable Models and their variants, despite their linear formulation,
have been widely used for shape representation, while most of the recently
proposed nonlinear approaches resort to intermediate representations, such as
3D voxel grids or 2D views. In this work, we introduce a novel graph
convolutional operator, acting directly on the 3D mesh, that explicitly models
the inductive bias of the fixed underlying graph. This is achieved by enforcing
consistent local orderings of the vertices of the graph, through the spiral
operator, thus breaking the permutation invariance property that is adopted by
all the prior work on Graph Neural Networks. Our operator comes by construction
with desirable properties (anisotropic, topology-aware, lightweight,
easy-to-optimise), and by using it as a building block for traditional deep
generative architectures, we demonstrate state-of-the-art results on a variety
of 3D shape datasets compared to the linear Morphable Model and other graph
convolutional operators.
Comment: to appear at ICCV 201
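The idea of a convolution over a fixed local vertex ordering can be illustrated with a minimal NumPy sketch. It assumes the spiral sequences have already been computed for every vertex and truncated to a common length; the names `spiral_conv` and `spiral_indices`, the toy indices and the layer sizes are all hypothetical and are not taken from the paper.

```python
import numpy as np


def spiral_conv(features, spiral_indices, weight, bias):
    """Apply one shared linear map to each vertex's ordered spiral neighbourhood.

    features:       (V, C_in) per-vertex input features
    spiral_indices: (V, L) vertex ids in a fixed, consistent order (assumed precomputed)
    weight:         (L * C_in, C_out) shared weights, one slice per spiral position
    bias:           (C_out,)
    """
    num_vertices = spiral_indices.shape[0]
    gathered = features[spiral_indices]        # (V, L, C_in), order is preserved
    flat = gathered.reshape(num_vertices, -1)  # concatenate along the spiral
    return flat @ weight + bias


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))
# Toy spiral sequences (made up for illustration, not derived from a real mesh).
spirals = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]])
weight = rng.normal(size=(3 * 3, 5))
bias = np.zeros(5)

out = spiral_conv(features, spirals, weight, bias)
# Reversing the ordering changes the result: the operator is order-sensitive.
reversed_out = spiral_conv(features, spirals[:, ::-1], weight, bias)
```

Because each spiral position gets its own slice of the weight matrix, reordering the neighbours changes the output, which is the sense in which such an operator is anisotropic rather than permutation invariant.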
Dynamic Neural Portraits
We present Dynamic Neural Portraits, a novel approach to the problem of
full-head reenactment. Our method generates photo-realistic video portraits by
explicitly controlling head pose, facial expressions and eye gaze. Our proposed
architecture is different from existing methods that rely on GAN-based
image-to-image translation networks for transforming renderings of 3D faces
into photo-realistic images. Instead, we build our system upon a 2D
coordinate-based MLP with controllable dynamics. Our intuition to adopt a
2D-based representation, as opposed to recent 3D NeRF-like systems, stems from
the fact that video portraits are captured by monocular stationary cameras, so
only a single viewpoint of the scene is available. We primarily
condition our generative model on expression blendshapes; nonetheless, we show
that our system can be successfully driven by audio features as well. Our
experiments demonstrate that the proposed method is 270 times faster than
recent NeRF-based reenactment methods, with our networks achieving speeds of 24
fps for resolutions up to 1024 x 1024, while outperforming prior works in terms
of visual quality.
Comment: In IEEE/CVF Winter Conference on Applications of Computer Vision
(WACV) 202
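A 2D coordinate-based MLP conditioned on a control vector can be sketched roughly as follows. This is only an illustrative NumPy forward pass with random weights; the layer sizes, the plain concatenation of coordinates and condition, and the names `init_mlp` and `render` are assumptions, not the architecture described in the paper.

```python
import numpy as np


def init_mlp(rng, sizes, scale=0.5):
    """Random (weight, bias) pairs for consecutive layer sizes."""
    return [(rng.normal(scale=scale, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def render(params, height, width, condition):
    """Map every pixel coordinate plus a shared condition vector to an RGB value."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, height),
                         np.linspace(-1.0, 1.0, width), indexing="ij")
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1)           # (H*W, 2)
    cond = np.broadcast_to(condition, (coords.shape[0], condition.shape[0]))
    h = np.concatenate([coords, cond], axis=1)                    # (H*W, 2 + K)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)                            # ReLU hidden layers
    w, b = params[-1]
    rgb = 1.0 / (1.0 + np.exp(-(h @ w + b)))                      # sigmoid keeps RGB in (0, 1)
    return rgb.reshape(height, width, 3)


rng = np.random.default_rng(0)
num_blendshapes = 4                                               # hypothetical size
params = init_mlp(rng, [2 + num_blendshapes, 32, 32, 3])
neutral = np.zeros(num_blendshapes)
smile = np.array([1.0, 0.0, 0.5, 0.0])                            # made-up expression code
frame_a = render(params, 8, 8, neutral)
frame_b = render(params, 8, 8, smile)
```

Changing only the condition vector changes the rendered frame while the pixel grid stays fixed, which is the basic mechanism by which such a model can be driven by expression or audio features.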
3D face morphable models "In-The-Wild"
3D Morphable Models (3DMMs) are powerful statistical models of 3D facial shape and texture, and among the state-of-the-art methods for reconstructing facial shape from single images. With the advent of new 3D sensors, many 3D facial datasets have been collected containing both neutral and expressive faces. However, all datasets are captured under controlled conditions. Thus, even though powerful 3D facial shape models can be learnt from such data, it is difficult to build statistical texture models that are sufficient to reconstruct faces captured in unconstrained conditions (in-the-wild). In this paper, we propose the first, to the best of our knowledge, in-the-wild 3DMM by combining a powerful statistical model of facial shape, which describes both identity and expression, with an in-the-wild texture model. We show that the employment of such an in-the-wild texture model greatly simplifies the fitting procedure, because there is no need to optimise with regard to the illumination parameters. Furthermore, we propose a new fast algorithm for fitting the 3DMM in arbitrary images. Finally, we have captured the first 3D facial database with relatively unconstrained conditions and report quantitative evaluations with state-of-the-art performance. Complementary qualitative reconstruction results are demonstrated on standard in-the-wild facial databases.
GANFIT: Generative adversarial network fitting for high fidelity 3D face reconstruction
In the past few years, a lot of work has been done towards reconstructing the 3D facial structure from single images by capitalizing on the power of Deep Convolutional Neural Networks (DCNNs). In the most recent works, differentiable renderers were employed in order to learn the relationship between the facial identity features and the parameters of a 3D morphable model for shape and texture. The texture features either correspond to components of a linear texture space or are learned by auto-encoders directly from in-the-wild images. In all cases, the state-of-the-art methods are still not capable of modeling facial textures in high fidelity. In this paper, we take a radically different approach and harness the power of Generative Adversarial Networks (GANs) and DCNNs in order to reconstruct the facial texture and shape from single images. That is, we utilize GANs to train a very powerful generator of facial texture in UV space. Then, we revisit the original 3D Morphable Model (3DMM) fitting approaches, which use non-linear optimization to find the optimal latent parameters that best reconstruct the test image, but from a new perspective. We optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. We demonstrate excellent results in photorealistic and identity-preserving 3D face reconstructions and achieve for the first time, to the best of our knowledge, facial texture reconstruction with high-frequency details.
Towards a complete 3D morphable model of the human head
Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for
representing the 3D shapes and textures of an object class. Here we present the
most complete 3DMM of the human head to date that includes face, cranium, ears,
eyes, teeth and tongue. To achieve this, we propose two methods for combining
existing 3DMMs of different overlapping head parts: i. use a regressor to
complete missing parts of one model using the other, ii. use the Gaussian
Process framework to blend covariance matrices from multiple models. Thus we
build a new combined face-and-head shape model that blends the variability and
facial detail of an existing face model (the LSFM) with the full head modelling
capability of an existing head model (the LYHM). Then we construct and fuse a
highly-detailed ear model to extend the variation of the ear shape. Eye and eye
region models are incorporated into the head model, along with basic models of
the teeth, tongue and inner mouth cavity. The new model achieves
state-of-the-art performance. We use our model to reconstruct full head
representations from single, unconstrained images allowing us to parameterize
craniofacial shape and texture, along with the ear shape, eye gaze and eye
color.
Comment: 18 pages, 18 figures, submitted to Transactions on Pattern Analysis
and Machine Intelligence (TPAMI) on the 9th of October as an extension paper
of the original oral CVPR paper: arXiv:1903.0378
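The first of the two combination strategies, completing the missing part of one model with a regressor, can be illustrated on synthetic data. The sketch below assumes a linear ridge regressor and a made-up low-rank "head" model; the abstract only says "a regressor", so the choice of regressor, the dimensions and the name `fit_ridge` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_face, n_rest, n_latent = 200, 30, 20, 5

# Synthetic stand-in for full-head shapes sampled from a linear model.
basis = rng.normal(size=(n_latent, n_face + n_rest))
latents = rng.normal(size=(n_samples, n_latent))
heads = latents @ basis
face, rest = heads[:, :n_face], heads[:, n_face:]   # observed part, part to complete


def fit_ridge(x, y, lam=1e-3):
    """Closed-form ridge regression: solve (x^T x + lam * I) w = x^T y."""
    gram = x.T @ x + lam * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


# Learn face -> rest-of-head on the first 150 samples, complete the other 50.
w = fit_ridge(face[:150], rest[:150])
completed = face[150:] @ w
max_error = float(np.abs(completed - rest[150:]).max())
```

On this toy data the completion is nearly exact because both parts are generated from the same low-dimensional latent code; real models would leave a residual that depends on how strongly the overlapping regions constrain the missing ones.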
3D head morphable models and beyond: algorithms and applications
It has been more than 20 years since the introduction of 3D morphable models (3DMMs) in the computer vision literature. They were proposed as a face representation based on principal component analysis for the tasks of image analysis, photorealistic manipulation, and 3D reconstruction from single images. Even so, to this date, the applications of such models are limited by a number of factors.
Firstly, correctly training 3DMMs requires a vast amount of 3D data that most of the time is not publicly available to the research community due to increasingly stringent data protection regulations. Hence, it is extremely difficult to combine and enrich multiple attributes of the human face/head without the initial 3D images. Additionally, many 3DMMs utilize different templates that describe distinct parts of the human face/head (i.e., face, cranium, ears, eyes), which partly overlap with each other and capture statistical variations that are extremely difficult to incorporate into a single universal morphable model. Moreover, despite the increasing level of detail in 3D face reconstruction from in-the-wild images, mainly attributed to recent advancements in deep learning, none of the available methods in the literature deals with the human tongue, which is important for speech dynamics and improves the realism of the oral cavity. Finally, there is limited work on 3D facial geometric enhancements and translations between different capturing systems due to the extremely limited availability of 3D datasets tailored for this task.
This thesis aims at tackling these shortcomings in all four domains.
A novel approach to combining and enriching existing 3DMMs without the underlying raw data is proposed. We introduce two methods for solving this problem: i. use a regressor to complete missing parts of one model using the other, ii. use a Gaussian Process framework to blend covariance matrices from multiple models. We showcase our approach by combining existing face and head 3DMMs with different templates and statistical variations.
Furthermore, we introduce to the research community the first Universal Head Model (UHM), which holds important statistical variation across all key structures of the human head that have an important contribution to the appearance and identity of a person. We later showcase how this model is used to create full head appearances from single in-the-wild images, thus making significant progress towards realistic human head digitization from data-deficient sources.
Additionally, we present the first method that accurately reconstructs the human tongue from single images by utilizing a novel generative framework which directly models the highly deformable surface of the human tongue and seamlessly merges it with our universal head model for more realistic representations of the oral cavity dynamics.
Lastly, this thesis presents a novel generative pipeline capable of converting and enhancing 3D facial scans from low to high quality. This will potentially aid depth sensor applications by increasing the quality of the output data while maintaining a low cost. It is also shown that the proposed framework can be extended to handle translations between various expressions on demand.
FitMe: Deep Photorealistic 3D Morphable Model Avatars
In this paper, we introduce FitMe, a facial reflectance model and a
differentiable rendering optimization pipeline, that can be used to acquire
high-fidelity renderable human avatars from single or multiple images. The
model consists of a multi-modal style-based generator, that captures facial
appearance in terms of diffuse and specular reflectance, and a PCA-based shape
model. We employ a fast differentiable rendering process that can be used in an
optimization pipeline, while also achieving photorealistic facial shading. Our
optimization process accurately captures both the facial reflectance and shape
in high-detail, by exploiting the expressivity of the style-based latent
representation and of our shape model. FitMe achieves state-of-the-art
reflectance acquisition and identity preservation on single "in-the-wild"
facial images, while it produces impressive scan-like results, when given
multiple unconstrained facial images pertaining to the same identity. In
contrast with recent implicit avatar reconstructions, FitMe requires only one
minute and produces relightable mesh and texture-based avatars, that can be
used by end-user applications.
Comment: Accepted at CVPR 2023, project page at https://lattas.github.io/fitme,
17 pages including supplementary material